Copilot
Contributor

@Copilot Copilot AI commented Oct 17, 2025

Summary

Fixed a bug where balancing groups were incorrectly being removed from negative lookarounds during regex tree reduction, causing patterns like ()(?'-1')(?!(?'-1')) to incorrectly return 0 matches instead of matching at every position.

Root Cause

The issue was introduced by an optimization in the RemoveCaptures method within ReduceLookaround in RegexNode.cs. This method removes capture groups from negative lookarounds since captures inside negative lookarounds are undone after the lookaround completes. However, it was incorrectly removing ALL capture groups, including balancing groups.

Balancing groups (e.g., (?'-1')) have semantic meaning that affects matching behavior - they require a specific capture group to have been captured before they can succeed. Removing them changes the match semantics.
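To make the dependency on a prior capture concrete, here is a minimal sketch using the public Regex API. The class and method names (BalancingGroupDemo, PopOnce, PopTwice) are hypothetical and exist only for illustration. The patterns deliberately avoid lookarounds, so they should not be affected by the optimization discussed in this PR.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the PR.
public static class BalancingGroupDemo
{
    // "()" records one (empty) capture for group 1; "(?'-1')" pops it.
    // The pop has something to remove, so the match can succeed.
    public static bool PopOnce(string input) =>
        Regex.IsMatch(input, @"()(?'-1')");

    // The second "(?'-1')" finds group 1 with no captures left,
    // so it fails and the overall match fails with it.
    public static bool PopTwice(string input) =>
        Regex.IsMatch(input, @"()(?'-1')(?'-1')");
}
```

For any input, PopOnce is expected to return true and PopTwice false. A balancing group is therefore a construct that can fail, and deleting it from the tree changes which inputs match.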

The Fix

Modified the RemoveCaptures method to preserve balancing groups: a capture node is now removed only when N == -1 (N stores the uncapture group number, so N != -1 marks a balancing group). The check uses pattern matching for cleaner syntax: if (node is { Kind: RegexNodeKind.Capture, N: -1 }). Updated comments to clarify that a capture can be removed only when nothing relies on or is affected by the state it persists; backreferences and balancing groups are the constructs that depend on that state.
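The shape of that check can be sketched on a toy tree. Everything below (ToyKind, ToyNode, ToyReducer) is a hypothetical stand-in written for this illustration; it is not the real RegexNode code and only mirrors the N == -1 test described above.

```csharp
using System.Collections.Generic;

// Toy stand-ins for illustration; these are NOT the real RegexNode types.
public enum ToyKind { Empty, Capture, NegativeLookaround }

public sealed class ToyNode
{
    public ToyKind Kind { get; init; }

    // Uncapture group number; -1 means "plain capture, not a balancing group".
    public int N { get; init; } = -1;

    public List<ToyNode> Children { get; } = new();
}

public static class ToyReducer
{
    // Replaces plain captures at or below parent.Children[index] with their contents.
    // Balancing groups (N != -1) are kept. Returns true if anything was removed.
    public static bool RemoveCaptures(ToyNode parent, int index)
    {
        ToyNode node = parent.Children[index];

        if (node is { Kind: ToyKind.Capture, N: -1 })
        {
            parent.Children[index] = node.Children[0];
            RemoveCaptures(parent, index); // the replacement may itself be a capture
            return true;
        }

        bool changed = false;
        for (int i = 0; i < node.Children.Count; i++)
        {
            changed |= RemoveCaptures(node, i);
        }
        return changed;
    }
}
```

In this sketch, a negative lookaround whose child is a plain capture wrapping a balancing group ends up with the balancing group as its direct child: the plain wrapper is dropped and the node with N != -1 survives.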

Changes Made

  1. RegexNode.cs: Modified RemoveCaptures to check node.N == -1 before removing captures, using pattern matching
  2. Regex.Match.Tests.cs: Added test case for balancing groups in negative lookarounds
  3. Regex.Count.Tests.cs: Added count test to verify correct match count

Test Results

  • ✅ All functional tests pass: 29,334 tests
  • ✅ All unit tests pass: 1,005 tests
  • ✅ Pattern ()(?'-1')(?!(?'-1')) now correctly matches at every position in the input string
Original prompt

This section details the original issue you should resolve

<issue_title>Regex ()(?'-1')(?!(?'-1')) exhibit incorrect matching behavior in .NET10</issue_title>
<issue_description>### Description

This occurs with both (?!) and (?<!), under the interpreter, RegexOptions.Compiled, and the source generator.

Reproduction Steps

using System.Text.RegularExpressions;

var pattern = @"()(?'-1')(?!(?'-1'))";
var input = "such situation occur in (?!) and (?<!) under Interpreter、Compiled and source GeneratedCode";
Regex regex = new Regex(pattern, RegexOptions.Compiled);
var matches = regex.Matches(input);
Console.WriteLine(Regex.Matches(input, pattern).Count);               // interpreter
Console.WriteLine(matches.Count);                                     // RegexOptions.Compiled
Console.WriteLine(RegexTester.regexGenerate().Matches(input).Count);  // source generator
internal partial class RegexTester
{
	[GeneratedRegex(@"()(?'-1')(?!(?'-1'))")]
	internal static partial Regex regexGenerate();
}

Here is part of the source-generated code:

partial class RegexTester
{
    /// <remarks>
    /// Pattern:<br/>
    /// <code>()(?'-1')(?!(?'-1'))</code><br/>
    /// Explanation:<br/>
    /// <code>
    /// ○ Fail to match.<br/>
    /// </code>
    /// </remarks>
    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("System.Text.RegularExpressions.Generator", "10.0.13.307")]
    internal static partial global::System.Text.RegularExpressions.Regex regexGenerate() => global::System.Text.RegularExpressions.Generated.regexGenerate_0.Instance;
}

private sealed class Runner : RegexRunner
{
    /// <summary>Scan the <paramref name="inputSpan"/> starting from base.runtextstart for the next match.</summary>
    /// <param name="inputSpan">The text being scanned by the regular expression.</param>
    protected override void Scan(ReadOnlySpan<char> inputSpan)
    {
        // The pattern never matches anything.
    }
}

Expected behavior

91
91
91

Actual behavior

0
0
0

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response</issue_description>

<agent_instructions>Please add tests for the problem and fix it. I expect the problem was introduced in #118084 and that the fix is simply to ensure that capture groups are only removed from negative lookarounds if they're not part of a balancing group.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@stephentoub Why do you believe the current behavior is incorrect? It's unclear from your description as the Regression section is empty... is your "Expected behavior" from a previous release?

Fixes #120849



@github-actions github-actions bot added the needs-area-label label Oct 17, 2025
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] Fix regex incorrect matching behavior in .NET10" to "Fix balancing groups incorrectly removed from negative lookarounds" Oct 17, 2025
@Copilot Copilot AI requested a review from stephentoub October 17, 2025 20:30
Copilot finished work on behalf of stephentoub October 17, 2025 20:30
@stephentoub stephentoub marked this pull request as ready for review October 17, 2025 20:35
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes a bug where balancing groups inside negative lookarounds were incorrectly removed during regex optimization, causing patterns like ()(?'-1')(?!(?'-1')) to fail to match when they should. The fix preserves balancing groups while still removing regular capture groups from negative lookarounds.

  • Modified RemoveCaptures method to check if a capture node is a balancing group before removing it
  • Added test cases to verify correct matching behavior for balancing groups in negative lookarounds
  • All existing tests continue to pass, confirming the optimization for regular captures still works

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs | Added check to preserve balancing groups (N != -1) while removing regular captures (N == -1) from negative lookarounds |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Added test case verifying single match behavior with balancing groups in negative lookarounds |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Count.Tests.cs | Added test case verifying correct match count for pattern with balancing groups in negative lookarounds |

@stephentoub stephentoub added the area-System.Text.RegularExpressions label and removed the needs-area-label label Oct 17, 2025
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@Copilot Copilot AI requested a review from stephentoub October 17, 2025 21:02
Copilot finished work on behalf of stephentoub October 17, 2025 21:02
Member

@tarekgh tarekgh left a comment


LGTM!

@tarekgh tarekgh merged commit 9c700f7 into main Oct 18, 2025
82 of 84 checks passed
@stephentoub
Member

/backport to release/10.0-staging

@stephentoub stephentoub deleted the copilot/fix-regex-matching-issue branch October 18, 2025 02:44
Contributor

Started backporting to release/10.0-staging: https://github.com/dotnet/runtime/actions/runs/18609604713

Contributor

@stephentoub an error occurred while backporting to "release/10.0-staging", please check the run log for details!

Error: The specified backport target branch "release/10.0-staging" wasn't found in the repo.

@hez2010
Contributor

hez2010 commented Oct 18, 2025

> /backport to release/10.0-staging

i believe there's no release/10.0-staging yet?

@stephentoub
Member

stephentoub commented Oct 18, 2025

> i believe there's no release/10.0-staging yet?

That's what the message above says, yes.
"Error: The specified backport target branch "release/10.0-staging" wasn't found in the repo."

@stephentoub
Member

/backport to release/10.0

Contributor

Started backporting to release/10.0: https://github.com/dotnet/runtime/actions/runs/18639594553


Development

Successfully merging this pull request may close these issues.

Regex ()(?'-1')(?!(?'-1')) exhibit incorrect matching behavior in .NET10
